Improving Facial Attribute Prediction using Semantic Segmentation
Attributes are semantically meaningful characteristics whose applicability widely crosses category boundaries. They are particularly important for describing and recognizing concepts for which no explicit training example is given, e.g., zero-shot learning. Additionally, since attributes are human-describable, they can be used for efficient human-computer interaction. In this paper, we propose to employ semantic segmentation to improve facial attribute prediction. The core idea lies in the fact that many facial attributes describe local properties; in other words, the probability of an attribute appearing in a face image is far from uniform in the spatial domain. We build our facial attribute prediction model jointly with a deep semantic segmentation network. This harnesses the localization cues learned by the semantic segmentation to guide the attention of the attribute predictor to the regions where different attributes naturally show up. As a result, in addition to recognition, we are able to localize the attributes despite having access only to image-level labels (weak supervision) during training. We evaluate our proposed method on the CelebA and LFWA datasets and achieve results superior to prior art. Furthermore, we show that in the reverse problem, semantic face parsing improves when facial attributes are available, which reaffirms the need to jointly model these two interconnected tasks.
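The weakly supervised localization described above can be illustrated with a minimal sketch (not the paper's actual architecture; the function name, the 1x1-conv weights, and all shapes are hypothetical): each attribute gets one spatial evidence map, and only the spatially pooled score is supervised by the image-level label, so the pre-pooling map localizes the attribute for free.

```python
import numpy as np

def attribute_maps_and_logits(feat, attr_w):
    """Toy weakly supervised attribute localization.

    feat:   (C, H, W) convolutional features of a face image
    attr_w: (A, C)    1x1-conv weights, one row per attribute
    returns: maps (A, H, W) of spatial evidence, logits (A,)
    """
    # One spatial evidence map per attribute (a 1x1 convolution).
    maps = np.einsum('ac,chw->ahw', attr_w, feat)
    # Global average pooling yields the image-level logit that the
    # (image-level) attribute label supervises during training.
    logits = maps.mean(axis=(1, 2))
    return maps, logits
```

At test time, `maps[a]` can be inspected directly: its peak indicates where attribute `a` fired, even though no spatial annotation was ever used.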
On Symbiosis of Attribute Prediction and Semantic Segmentation
In this paper, we propose to employ semantic segmentation to improve person-related attribute prediction. The core idea lies in the fact that the probability of an attribute appearing in an image is far from uniform in the spatial domain. We build our attribute prediction model jointly with a deep semantic segmentation network. This harnesses the localization cues learned by the semantic segmentation to guide the attention of the attribute predictor to the regions where different attributes naturally show up. Therefore, in addition to prediction, we are able to localize the attributes despite having access only to image-level labels (weak supervision) during training. We first propose semantic segmentation-based pooling and gating, denoted SSP and SSG, respectively. In the former, the estimated segmentation masks are used to pool the final activations of the attribute prediction network from multiple semantically homogeneous regions. In SSG, the same idea is applied to the intermediate layers of the network. SSP and SSG, while effective, impose heavy memory utilization, since each channel of the activations is pooled/gated with all the semantic segmentation masks. To circumvent this, we propose Symbiotic Augmentation (SA), where we learn only one mask per activation channel. SA allows the model to either pick one semantic map or combine several (by weighted superposition) in order to generate the proper mask for each channel. SA simultaneously applies the same mechanism to the reverse problem, leveraging the output logits of attribute prediction to guide the semantic segmentation task. We evaluate our proposed methods on the CelebA and LFWA datasets for facial attributes, and on WIDER Attribute and Berkeley Attributes of People for whole-body attributes. Our proposed methods achieve superior results compared to previous work.
Comment: Accepted for publication in PAMI. arXiv admin note: substantial text overlap with arXiv:1704.0874
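A minimal numeric sketch of the SSP pooling and SA mask-combination ideas described above, assuming soft segmentation masks and NumPy arrays (the function names, shapes, and the softmax weighting are illustrative, not the paper's exact formulation):

```python
import numpy as np

def ssp_pool(activations, masks, eps=1e-8):
    """SSP-style pooling: average-pool each activation channel within
    each predicted segmentation region, one pooled vector per region.

    activations: (C, H, W) final feature maps of the attribute network
    masks:       (K, H, W) soft segmentation masks (one per region)
    returns:     (K, C) region-pooled features
    """
    C = activations.shape[0]
    K = masks.shape[0]
    flat_a = activations.reshape(C, -1)              # (C, H*W)
    flat_m = masks.reshape(K, -1)                    # (K, H*W)
    num = flat_m @ flat_a.T                          # (K, C) mask-weighted sums
    den = flat_m.sum(axis=1, keepdims=True) + eps    # per-mask normalization
    return num / den

def sa_mask(masks, channel_weights):
    """SA-style mask generation: one learned weighting per activation
    channel combines the K semantic maps (weighted superposition) into
    a single mask per channel, avoiding the K-fold memory cost of SSP/SSG.

    masks:           (K, H, W)
    channel_weights: (C, K) learned combination weights (softmax-normalized)
    returns:         (C, H, W) one mask per channel
    """
    w = np.exp(channel_weights - channel_weights.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)             # softmax over the K maps
    return np.einsum('ck,khw->chw', w, masks)
```

A sharply peaked weight row picks essentially one semantic map for that channel, while a flat row blends them, which matches the "pick one or combine" behavior described in the abstract.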
Human semantic parsing for person re-identification
Person re-identification is a challenging task mainly due to factors such as background clutter, pose, illumination, and camera point-of-view variations. These elements hinder the process of extracting robust and discriminative representations, hence preventing different identities from being successfully distinguished. To improve the representation learning, local features are usually extracted from human body parts. However, the common practice for such a process has been based on bounding-box part detection. In this paper, we propose to adopt human semantic parsing which, due to its pixel-level accuracy and capability of modeling arbitrary contours, is naturally a better alternative. Our proposed SPReID integrates human semantic parsing in person re-identification and not only considerably outperforms its counterpart baseline, but achieves state-of-the-art performance. We also show that, by employing a simple yet effective training strategy, standard popular deep convolutional architectures such as Inception-V3 and ResNet-152, with no modification, while operating solely on the full image, can dramatically outperform the current state-of-the-art. Our proposed methods improve state-of-the-art person re-identification on: Market-1501 [48] by ~17% in mAP and ~6% in rank-1, CUHK03 [24] by ~4% in rank-1, and DukeMTMC-reID [50] by ~24% in mAP and ~10% in rank-1.
Computer Vision Foundation conference proceedings, June 2018.
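The parsing-based pooling idea can be sketched as follows (a hypothetical simplification of SPReID's aggregation; the part list, shapes, and concatenation scheme are assumptions, not the paper's exact design). Pixel-level parsing masks replace bounding-box part crops when pooling local features:

```python
import numpy as np

def spreid_descriptor(feat, part_masks, eps=1e-8):
    """Toy parsing-weighted descriptor for person re-identification.

    feat:       (C, H, W) backbone activations for one person image
    part_masks: (P, H, W) soft human-parsing masks (e.g. head, torso, legs)
    returns:    (C * (P + 1),) concatenated global + per-part descriptor
    """
    C = feat.shape[0]
    flat_f = feat.reshape(C, -1)                     # (C, H*W)
    global_vec = flat_f.mean(axis=1)                 # plain global average pool
    flat_m = part_masks.reshape(part_masks.shape[0], -1)
    # Mask-weighted average pooling per parsing region: pixels outside a
    # part contribute nothing, unlike a rectangular bounding box.
    part_vecs = (flat_m @ flat_f.T) / (flat_m.sum(axis=1, keepdims=True) + eps)
    return np.concatenate([global_vec, part_vecs.reshape(-1)])
```

Descriptors of two images can then be compared with cosine distance as usual; the parsing masks simply decide which pixels feed each local feature.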
NMF-KNN: Image Annotation Using Weighted Multi-view Non-negative Matrix Factorization
Real-world image databases such as Flickr are characterized by the continuous addition of new images. Recent approaches for image annotation, i.e., the problem of assigning tags to images, have two major drawbacks. First, either models are learned using the entire training data, or, to handle the issue of dataset imbalance, tag-specific discriminative models are trained. Such models become obsolete and require relearning when new images and tags are added to the database. Second, the task of feature fusion is typically dealt with using ad hoc approaches. In this paper, we present a weighted extension of Multi-view Non-negative Matrix Factorization (NMF) to address the aforementioned drawbacks. The key idea is to learn a query-specific generative model on the features of nearest neighbors and tags using the proposed NMF-KNN approach, which imposes a consensus constraint on the coefficient matrices across different features. This forces the coefficient vectors across features to be consistent and thus naturally solves the problem of feature fusion, while the weight matrices introduced in the proposed formulation alleviate the issue of dataset imbalance. Furthermore, our approach, being query-specific, is unaffected by the addition of images and tags to the database. We tested our method on two datasets used for evaluation of image annotation and obtained competitive results.
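The consensus-constrained multi-view factorization can be sketched with simple multiplicative updates (an illustrative formulation, not the paper's exact NMF-KNN algorithm; the update rules fold a penalty lam * ||H_v - H*||^2 into the standard NMF updates, and the weight matrices of the paper are omitted):

```python
import numpy as np

def multiview_nmf(views, rank, lam=0.5, iters=300, seed=0):
    """Toy multi-view NMF with a consensus constraint.

    Each view X_v is factored as X_v ~ W_v H_v, with every coefficient
    matrix H_v pulled toward a shared consensus H* (the mean of all H_v),
    so the per-item coefficients agree across feature types.

    views: list of non-negative matrices X_v, each (d_v, n), same n items
    returns: (list of W_v, list of H_v, consensus H*)
    """
    rng = np.random.default_rng(seed)
    n = views[0].shape[1]
    Ws = [rng.random((X.shape[0], rank)) + 0.1 for X in views]
    Hs = [rng.random((rank, n)) + 0.1 for _ in views]
    eps = 1e-9
    for _ in range(iters):
        Hstar = np.mean(Hs, axis=0)                  # consensus across views
        for v, X in enumerate(views):
            W, H = Ws[v], Hs[v]
            # Standard multiplicative updates; the consensus penalty
            # lam*||H - H*||^2 adds lam*Hstar / lam*H terms to H's update.
            W *= (X @ H.T) / (W @ H @ H.T + eps)
            H *= (W.T @ X + lam * Hstar) / (W.T @ W @ H + lam * H + eps)
    return Ws, Hs, np.mean(Hs, axis=0)
```

Because the updates are multiplicative, non-negativity is preserved automatically; tags for a query image would then be scored from the consensus coefficients of its nearest neighbors.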
How To Take A Good Selfie?
Selfies are now a global phenomenon. The massive number of self-portrait images taken and shared on social media is revolutionizing the way people introduce themselves, and the circles of their friends, to the world. While taking photos of oneself can be seen simply as recording personal memories, the urge to share them with other people adds an exclusive sensation to selfies. Due to the big-data nature of selfies, it is nearly impossible to analyze them manually. In this paper, we provide, to the best of our knowledge, the first selfie dataset for research purposes, with more than 46,000 images. We address interesting questions about selfies, including how the appearance of certain objects, concepts, and attributes influences the popularity of selfies. We also study the correlation between popularity and sentiment in selfie images. In a nutshell, from a large-scale dataset, we automatically infer what makes a selfie a good selfie. We believe that this research creates new opportunities for social, psychological, and behavioral scientists to study selfies from a large-scale point of view, a perspective that best fits the nature of the selfie phenomenon.
UCF-CRCV at TRECVID 2015: Semantic Indexing
This paper describes the system we used for the main task of Semantic INdexing (SIN) at TRECVID 2015. Our system uses a five-stage processing pipeline comprising feature extraction, pooling, encoding, classification, and reranking. We employed CNN-based representations, as well as Dense and Root SIFT, as features for our system. We also report results of our experiments with SentiBank features and data augmentation techniques that did not contribute to the performance of the final system. Our second run, ‘Rostam’, achieved an infAP of 26.67% on the 30 concepts evaluated for SIN 2015.
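The five-stage pipeline named above can be caricatured in a few lines (everything here is a stand-in: the real system uses CNN/SIFT features, a learned encoder, trained classifiers, and a more involved reranking step; the function name and the prior-based rerank are assumptions for illustration only):

```python
import numpy as np

def sin_pipeline(frames, codebook, clf_w, clf_b, prior):
    """Toy five-stage concept-scoring pipeline for one video shot.

    frames:   (T, D) per-frame descriptors        [1. feature extraction]
    codebook: (K, D) visual codebook              [3. encoding]
    clf_w, clf_b: linear classifier parameters    [4. classification]
    prior:    scalar concept prior                [5. reranking]
    returns:  final concept score for the shot
    """
    pooled = frames.mean(axis=0)                  # 2. pooling over frames
    sims = codebook @ pooled                      # 3. encode: soft assignment
    code = np.exp(sims - sims.max())              #    against the codebook
    code /= code.sum()
    score = float(code @ clf_w + clf_b)           # 4. linear classification
    return score * prior                          # 5. rerank by concept prior
```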